The study of the effect of training set on statistical language modeling

نویسندگان

  • Xipeng Shen
  • Bo Xu
چکیده

In this work, we make a study on the effect of training set on statistical language modeling (SLM). A corpus selection system based on perplexity is presented. It is tested in two experiments: one is to select optimal training corpus for generating a domain-specific SLM; the other one is for generating an optimal SLM for a LVCSR system. The results show that the training corpus is important for the capability of SLM and our corpus selection system is powerful for optimal corpus selection. With the help of this system, we generated a SLM for a LVCSR system, which contributed 14.5%--17.7% relative character error reduction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploratory-cumulative vs. Disputational Talk on Cognitive Dependency of Translation Studies: Intermediate level students in focus

The present study set out to determine the effect of implementing exploratory-cumulative talk in comparison to disputational talk on cognitive (meaning development and organization of thought as well as problem solving ability) dependency of intermediate level students in translation studies. In order to achieve the objectives of the study, a quasi-experimental-pretest-posttest-statistical stud...

متن کامل

The Effect of Applying Color and Light Training Materials on the Female First Grade Students’ Learning Outcome of Persian Language Lessons in Sharoud

The color images raise the level of educational by providing more detailed information; therefore, these images are believed to be effective in gaining a deeper understanding of the lessons. Moreover, Proper lighting enhances students’ learning and performance. This study aimed at assessing the impact of training color and light materials on elementary school girls’ attention and learning in Pe...

متن کامل

Investigating the Effect of the Training Program on Raters’ Oral Performance Assessment: A Mixed-Methods Study on Raters’ Think-Aloud Verbal Protocols

Although the use of verbal protocols is growing in oral assessment, research on the use of raters’ verbal protocols is rather rare. Moreover, those few studies did not use a mixed-methods design. Therefore, this study investigated the possible impacts of rater training on novice and experienced raters’ application of a specified set of standards in rating. To meet this o...

متن کامل

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

متن کامل

The effect of leader motivational language on employee proactivity in Bouali Sina Petrochemical Co.

The present study aimed to investigate the effect of the leader's motivational language on employee pro-activity with the mediating role of psychological meaningfulness of job and organizational vitality of the individuals. This research was an applied research in terms of practical purpose, and descriptive-correlation in terms of data collection and it was a type of quantitative research. The ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001